ABSTRACT

Using 16,068 articles in Wikipedia's Medicine Wikiproject,
we study the relationship between collaboration and
quality. We assess whether certain collaborative patterns 
are associated with information quality in terms of self- 
evaluated quality and article viewership. We find that 
the number of contributors has a curvilinear relation- 
ship to information quality, more contributors improving 
quality but only up to a certain point. The other articles
on which an artifact's contributors work also influence its
quality, creating an interdependent network
of artifacts and contributors. Finally, we see evidence of
a recursive relationship between information quality and 
contributor activity, but that this recursive relationship 
attenuates over time. 



INTRODUCTION 

A new generation of IT-enabled collaborative tools, such
as wikis, blog communities, and electronic social networks,
enables people to share and create knowledge
in new and potentially more powerful ways. These tools
transcend traditional limitations and enable collective 
collaboration across conventional organizational bound- 
aries. Nevertheless, the mere presence of these tools does 
not ensure effective collaboration or the creation of valu- 
able knowledge. People and organizations must use these 
tools effectively to generate valuable outcomes. 

Most of the extant research on these types of IT-enabled 
collaboration focuses on independent peer production 
communities working together to produce a single infor- 
mation artifact such as an open source software product 
or a single Wikipedia article. This research overlooks the 
fact that these collaborative environments often produce 
multiple information artifacts concurrently, and contrib- 
utors may transfer knowledge from one artifact to an- 
other. Collaboration associated with the production of 
one information artifact may not be independent of the 
collaboration associated with other information artifacts 
on a shared collaboration platform. Understanding how 
collaboration occurs on one peer-produced information 
artifact can be important for understanding the quality 
of the other artifacts produced on the shared platform. 

Wikipedia is becoming an increasingly important source 
of information for the general public, and it provides an 
excellent forum in which to examine collaborative prac- 
tices (Kane & Fichman 2009). We examine the collab- 
orative activities that occur between 16,068 articles and 
40,479 contributors in the Wikipedia Medicine Wikiproject.
We investigate whether the quality of an article is
associated with the collaborative activity of its contributors
to other Wikipedia articles. Furthermore, we examine 
whether there is a recursive relationship between infor- 
mation quality and the contributions an information ar- 
tifact receives. 

In general, we find support for our hypotheses. The col- 
laborative processes that produce information artifacts 
on IT-enabled collaborative platforms are not indepen- 
dent from one another. Instead, their development is 
also influenced by the work of contributors on other in- 
formation artifacts on the platform. Furthermore, we 
find a recursive relationship between information quality
and collaboration. More contributors create better
information artifacts, which in turn attract more
contributors, but this relationship attenuates over time.

Future research examining the quality of peer-produced
information may be well served by considering the
interconnection between collaborative projects and the
dynamics of that collaboration over time.



THE ROLE OF INFORMATION ARTIFACTS 

Researchers have historically conceptualized IT-enabled 
collaborative environments as a network, examining how 
the structural features of those networks are associ- 
ated with information benefits provided by that network 
(Ahuja, Galletta & Carley 2003, Wasko & Faraj 2005). 
An important difference between these previous collabo- 
rative environments and newer generations of IT-enabled 
collaborative tools is that people often use these emer- 
gent platforms to create peer-produced information ar- 
tifacts, such as shared wiki articles, blog posts and com- 
ments, online videos and ratings, or interactive profile 
pages on electronic social network platforms. 

These information artifacts preserve and extend the work 
of individual contributors and create a whole that is
fundamentally different from the sum of its parts or the
intentions of the original contributors. A good example is how
these information artifacts are often co-opted following a
contributor's death (Cohen 2009), when they cease being
an outlet for the individual and become a place to
commemorate and memorialize their original author. Thus,
the peer-produced information artifacts may be used in- 
dependently of the individuals who contributed to them 
or the purposes for which they were contributed. As 
such, these information artifacts may influence and be 
influenced by the contributors that work on them and 
thus can be considered as independent entities within 
the collaborative platform.

As contributors work on multiple information artifacts
over time, they transfer information and knowledge
contained in one artifact to another to improve the
information quality of the recipient artifact. Three types of
information and knowledge found in one artifact might
be helpful to the development of another: content,
process, and reputational information. First, the content
found in one information artifact may simply be used
to improve the content of another. Second, contributors
can also transfer process information from one artifact 
to another to improve the quality of the focal artifact. 
Based on their experience working on one information 
artifact, contributors may have learned effective ways to 
collaborate with others using the IT-enabled collabora- 
tive platform. For instance, if an individual has been 
involved in a large number of conflicts in other commu- 
nities, s/he may have gained valuable insight on how to 
handle similar types of conflict in other communities. 
Finally, contributors can transfer reputational informa- 
tion about other contributors and how they contribute 
to (or detract from) the effective development of other 
information artifacts. If one contributor has a reputation
for high-quality work, other contributors may be more
willing to trust that contributor's insight than the insight
of a contributor who is relatively unknown.

If information artifacts have an identity independent
from the contributors who create them and if the
contributors can transfer information gained from one artifact
to another, the result is an interconnected network of 
contributors and information artifacts. The structure of 
this network might have significant implications for the 
quality of the information on the platform. Social Net- 
work Analysis (SNA) has been adopted by the organiza- 
tional literature as a productive approach for studying 
these types of interconnected collaborative environments 
(Cross & Prusak 2002, Borgatti & Cross 2003, Reagans
& McEvily 2003, Cummings 2004, Ransbotham, Kane 
& Lurie 2012). 



Network Structure and Information Quality 

Three aspects of the collaboration that occurs in IT- 
enabled collaborative environments have been theorized 
as related to the quality of information it produces 
(Constant, Sproull & Kiesler 1996).

First, the number of contributors is often associated with 
the quality of information. Attracting a sufficient number
of contributors is important for collaborative
user-generated content. More contributors increase the effort
and energy dedicated to creating content and provide
a broader array of knowledge and abilities for content
creation. This should increase the value of collaborative
user-generated content. Research on prediction markets,
virtual teams, and social networks suggests that the
quality of aggregate information, number of ideas generated,
and likelihood of a valuable answer increase with
the number of participants (Constant et al. 1996, Martins,
Gilson & Maynard 2004, Foutz & Jank 2010).

At the same time, other research suggests that having
too many contributors can also be problematic. After a
certain point, the marginal cost of adding new members
exceeds their marginal value. Consistent with the adage
"too many cooks spoil the stew," an excessive number
of contributors can negatively influence the value of
user-generated content. As the number of contributors grows,
the marginal value of additional contributors decreases
while the cognitive and coordination costs associated
with contributions increase (Asvanund, Clay, Krishnan
& Smith 2004, Jones, Ravid & Rafaeli 2004, Ransbotham
& Kane 2011). In particular, those involved in the
co-creation of content are likely to suffer from information
overload as they try to make sense of and respond to
others' contributions.

This rationale suggests a curvilinear relationship
between number of contributors and information quality.
The most valuable collaborative user-generated content
is generated when enough contributors are attracted to
sustain production but not so many that they create
information overload. Considerable empirical
evidence supports such curvilinear relationships between
number of contributors and outcomes in online
collaborative groups (Butler 2001, Hansen & Haas 2001,
Asvanund et al. 2004, Oh & Jeon 2007, Ransbotham &
Kane 2011). Similar relationships have also been found
in traditional organizations. For instance, software de- 
velopment teams often need sufficient resources to ac- 
complish their goals, but adding more members to a 
troubled or delayed project can compound delays by in- 
creasing coordination costs (Brooks 1975), often expo- 
nentially, as new members are added (Espinosa, Slaugh- 
ter, Kraut & Herbsleb 2007). Thus, we expect a curvi- 
linear relationship between the number of contributors 
and the quality of collaborative user-generated content. 
These ideas lead to our first hypothesis: 

Hypothesis 1. The quality of a peer-produced infor- 
mation artifact will be curvilinearly (inverted-U) related 
to the number of contributors to the artifact. 

Second, greater diversity of information sources pro- 
vided by an IT-enabled collaborative environment im- 
proves the quality of information it produces (Constant 
et al. 1996). Additional contributors may not be par- 
ticularly valuable if they provide the same information 
already possessed by other contributors, but they are 
valuable when they provide access to information not 
already possessed by existing contributors (Burt 1997, 
Uzzi 1997, Kane & Alavi 2007). 

The number of different information artifacts that
contributors work on reflects the diversity of information
available to an information artifact on a mass collaboration
platform. It can reflect either the knowledge directly
available for transfer from other information artifacts or
the underlying knowledge
possessed by the individual contributors. The diversity
of knowledge possessed by individual contributors on an 
IT-enabled collaborative platform may be revealed in the 
pattern of other artifacts the contributors work on. For 
instance, contributors with deep, specialized knowledge 
may work intensely in a few communities with related 
purposes; whereas contributors with broad, more gener- 
alized knowledge may work more superficially on a broad 
range of other artifacts. 

Further, these patterns reveal the type of knowledge pos- 
sessed by the contributor. For example, in the wake 
of the Virginia Tech Massacre, contributors reported 
very different reasons for contributing to the related 
Wikipedia article (Kane & Fichman 2009). Some con- 
tributors did so because of their knowledge of the school, 
some because they had knowledge and interest regarding 
the relevant gun control issues, and still others because 
they were skilled copyeditors. The first type of contrib- 
utor may also contribute to articles on Virginia or other 
colleges, the second type might also contribute to other 
gun-related topics, and the third type may contribute to 
a diverse range of articles of a particular length or stage 
of development. Thus, the other information artifacts
on which a contributor works reflect the underlying
knowledge and/or topical interests possessed by that
contributor. We hypothesize that the number of different
information artifacts on which a contributor in a peer- 
production community also works reflects the diversity 
of information sources available to the community. 

Hypothesis 2. The quality of a peer-produced infor- 
mation artifact will be positively related to the number 
of other information artifacts on which its contributors 
work. 

The depth of resources available in a collaborative envi- 
ronment will also be related to the quality of information 
it produces (Constant et al. 1996). Even if a collabora- 
tive environment provides access to a large number and 
to a diverse range of information sources, some sources 
have deeper and more valuable resources than others. 
Certain contributors simply provide access to greater in- 
formation resources, either as a result of their connection 
to information artifacts with more resources or as a result 
of the underlying resources possessed by that individual. 
Access to deeper resources generated in more active peer- 
production environments positively relates to the quality 
of information produced by the community. In many
IT-enabled collaborative platforms, a relatively small
percentage of information artifacts accounts for a relatively
large amount of the collaborative activity that occurs on
them (Kuk 2006). Information artifacts that are the source
of more collaborative activity are deeper sources of valu- 
able content, process, and reputational information. 

Similarly, an individual's influence as a contributor
to artifacts that host abundant collaborative activity
may reveal the underlying depth of resources
possessed by that individual. IT-enabled collaborative
platforms typically employ limited hierarchical and
administrative structures, if they possess any at all (Butler,
Joyce & Pike 2008). Individuals who emerge as promi- 
nent contributors do so largely because members of the 
community recognize them as valuable contributors. In 
peer-produced information artifacts, someone can
contribute heavily only by the consent of other contributors.
If other contributors do not approve of an individual's
contributions, they will either resist them, forcing
the unwelcome contributor to relent or leave (Kane,
Majchrzak, Johnson & Chen 2009), or else leave
themselves, as they are no longer receiving benefits
from the collaborative community (Butler 2001).

Thus, within an IT-enabled collaborative platform, the
activity level of contributors on prominent information 
artifacts reflects the depth of resources available to a peer 
production community. More active information com- 
munities are the source of deeper content, process, and 
reputational information than less active communities. 
Peer-produced artifacts with access to deeper resources 
are more likely to produce higher quality information. 

Hypothesis 3. The depth of collaborative activity that
occurs within an information artifact's collaborative
network will be positively related to artifact quality.

While we hypothesize above that collaboration leads to
improved quality of the information artifact, it is also
possible that the quality of the information artifact will
in turn lead to certain types of collaborative behavior (Kane
et al. 2009). As articles increase in quality, they
are more likely to attract interest from outsiders who 
seek to access that information, either for the content or 
as a collaborative exemplar. In open collaborative envi- 
ronments, all viewers of an article are also potential con- 
tributors. Higher quality information may, therefore, at- 
tract more viewers, introducing the possibility that these 
viewers in turn become collaborators. Thus, while we 
have hypothesized that certain collaborative structures 
lead to improved information quality, it is also possible 
that improved information quality will in turn lead to 
certain collaborative patterns. We therefore hypothesize a
recursive relationship between the quality of an article 
and the collaboration it generates. 

Hypothesis 4. There will be a recursive, positive re- 
lationship between information quality and the collabora- 
tion that occurs on an information artifact. 

RESEARCH METHOD AND SETTING 

To test our hypotheses, we employ social network anal- 
ysis (SNA). Social network analysis is capable of ex- 
amining more complex networks comprising different 
types of nodes (Wasserman & Faust 1994). A traditional
but infrequently used network conceptualization
is known as the two-mode network (Borgatti &
Everett 1997, Faust 1997). A two-mode network is a
general network structure consisting of two fundamentally
different types of entities that cannot be examined
equivalently with one another. Here, we conceptualize
our two-mode network as consisting of the information
artifacts and individual collaborators as nodes; editing
activities are the ties that connect them, because editing
transfers information and knowledge from one artifact to
another.

We use two different network measures to operationalize
our two remaining hypotheses: degree centrality and
eigenvector centrality. These centrality measures are often
used in conjunction to capture the features of the
local (degree) and the global (eigenvector) social
network (Friedkin 1991, Faust 1997). Degree centrality in
the two-mode network is used to measure the diversity of 
information sources available to an information artifact 
through its contributors. Eigenvector centrality captures 
the depth of information sources available to an infor- 
mation artifact. This measure summarizes the node's 
centrality in the global network of all of the nodes and 
ties that compose the network. 
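
The two centrality measures described above can be illustrated with a minimal sketch. It assumes the two-mode network is given as a list of (contributor, article) edit pairs; the function name, the toy data, and the identity-shifted power iteration are illustrative assumptions, not the procedure used in the study, and the aggregation of contributor scores into article-level variables is not reproduced here.

```python
from collections import defaultdict
from math import sqrt


def two_mode_centrality(edits, max_iter=500, tol=1e-12):
    """Return (degree, eigenvector) centrality for every node.

    `edits` is an iterable of (contributor, article) pairs; repeated
    edits by the same contributor to the same article count as one tie.
    """
    neighbors = defaultdict(set)
    for contributor, article in edits:
        c_node = ("contributor", contributor)
        a_node = ("article", article)
        neighbors[c_node].add(a_node)
        neighbors[a_node].add(c_node)

    # Degree centrality: number of distinct ties of each node.
    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}

    # Power iteration on (A + I): the identity shift avoids the oscillation
    # that plain power iteration shows on bipartite (two-mode) graphs.
    score = {node: 1.0 for node in neighbors}
    for _ in range(max_iter):
        updated = {
            node: score[node] + sum(score[n] for n in adjacent)
            for node, adjacent in neighbors.items()
        }
        norm = sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        change = sum(abs(updated[node] - score[node]) for node in updated)
        score = updated
        if change < tol:
            break
    return degree, score


# Hypothetical toy data, not drawn from the Medicine Wikiproject.
edits = [
    ("ann", "Asthma"), ("ann", "Aspirin"),
    ("bob", "Asthma"), ("bob", "Aspirin"), ("bob", "Anemia"),
    ("cat", "Anemia"),
]
degree, eigenvector = two_mode_centrality(edits)
```

In this toy network all three articles have the same degree, yet "Asthma" scores higher than "Anemia" on eigenvector centrality because its contributors are themselves better connected, which is the local versus global distinction the two measures are meant to capture.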

We focus our empirical analysis on the 16,068 articles within
the Medicine Wikiproject in Wikipedia. A Wikiproject
is a group of contributors dedicated to developing, maintaining,
and organizing articles related to a particular topic. We
focus on a single Wikiproject, because a random sample 
of articles would not likely yield the social network fea- 
tures of theoretical interest and a Wikiproject provides 
clearly defined boundaries. 

Data Collection 

We downloaded the full text history of 2,029,443 revi- 
sions of 16,068 articles by 40,479 unique contributors in 
the Medicine Wikiproject as of June 2009, which resulted
in a 50 GB raw data set. We employed a 70-node
Linux cluster to allow for simultaneous downloads and
processing of these extensive data. For each contribution,
we recorded the contributor's identity, the changes
made, a description of the change, and the time of the 
change. 
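
As a minimal sketch of how such revision records could be turned into the contributor counts used later, the following assumes each revision is held in a small record type. The class name, field names, and sample rows are hypothetical and are not taken from the study's data pipeline.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Revision:
    """One contribution; field names are illustrative assumptions."""
    article: str
    contributor: str   # contributor's identity
    diff: str          # the changes made
    comment: str       # description of the change
    timestamp: datetime


def contributors_per_article(revisions):
    """Count the unique contributors who edited each article."""
    seen = {}
    for revision in revisions:
        seen.setdefault(revision.article, set()).add(revision.contributor)
    return {article: len(users) for article, users in seen.items()}


# Hypothetical sample rows.
revisions = [
    Revision("Asthma", "ann", "+1 sentence", "add source", datetime(2009, 1, 5)),
    Revision("Asthma", "bob", "-typo", "copyedit", datetime(2009, 1, 6)),
    Revision("Asthma", "ann", "+table", "expand", datetime(2009, 2, 1)),
    Revision("Anemia", "bob", "+1 paragraph", "expand", datetime(2009, 2, 3)),
]
counts = contributors_per_article(revisions)
```

Repeated edits by the same contributor are counted once, so the result reflects the number of contributors rather than the number of revisions.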

Dependent Variables 

We use two different measures to evaluate the quality 
of an article. First, we assess self-evaluated quality. 
The Medicine Wikiproject evaluates articles on a 7-point 
scale (Stub, Start, C, B, Good, A, Featured). We recruited
two fourth-year medical school students to
independently validate the quality of a subsample of 120
randomly selected articles. Each student independently
evaluated each article, then ratings were compared and
reconciled to create a single reviewer rating. These
reconciled ratings were then compared to the ratings
assigned by the Medicine Wikiproject. The students
reached an 85% interrater reliability with one another,
and the reconciled ratings achieved a 90% agreement 
with the ratings of information quality assigned by the 
Wikiproject. These results suggest the self-evaluated 
quality was a good proxy for the overall quality of the 
article. 
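
The validation step can be sketched as follows, under the assumption that agreement is computed as the simple share of articles on which two sets of ratings match; the text does not state which reliability statistic was used, and the ratings below are hypothetical.

```python
# Ordered quality scale as listed in the text, lowest to highest.
SCALE = ["Stub", "Start", "C", "B", "Good", "A", "Featured"]
RANK = {label: position for position, label in enumerate(SCALE)}


def encode(labels):
    """Map quality labels to ordinal ranks (0 = Stub, 6 = Featured)."""
    return [RANK[label] for label in labels]


def percent_agreement(ratings_a, ratings_b):
    """Share of items on which two raters assign the same label."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical ratings for five articles.
rater_1 = ["Stub", "Start", "B", "Good", "C"]
rater_2 = ["Stub", "Start", "B", "A", "C"]
agreement = percent_agreement(rater_1, rater_2)  # 4 of 5 match -> 0.8
```

The `encode` helper gives the ordinal coding that an ordinal regression on this scale would need.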

Second, we also use public-evaluated quality in order 
to provide an independent and finer-grained measure of
quality that provides richer data for testing Hypothesis 4.
Here, we operationalized information quality as
the number of times an article has been viewed, which 
is an indication of how this information is valued by the 
public (Ransbotham et al. 2012). For each article, we 
collected the number of views each day from December 
2007 until June 2009; these data are not available for the 
entire history of Wikipedia. We summarized the view 
counts by month. 
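
Summarizing daily view counts by month can be sketched as below, assuming the raw data are (article, date, views) triples; the function name and the sample values are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_views(daily_counts):
    """Sum daily view counts into (article, year, month) totals."""
    totals = Counter()
    for article, day, views in daily_counts:
        totals[(article, day.year, day.month)] += views
    return dict(totals)


# Hypothetical daily counts.
daily = [
    ("Asthma", date(2008, 1, 30), 100),
    ("Asthma", date(2008, 1, 31), 120),
    ("Asthma", date(2008, 2, 1), 90),
    ("Aspirin", date(2008, 1, 31), 40),
]
by_month = monthly_views(daily)
```

Each resulting (article, year, month) entry corresponds to one article-month observation of the kind analyzed in the viewership model.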

ANALYSIS AND RESULTS 

The full dataset was then analyzed using Stata. Ordinal
regression is appropriate when there is a progressive
relationship within a categorical dependent variable
but the magnitude of the difference between the
categories is unclear. For instance, the observer may know which
Olympic athletes have won the gold, silver, and bronze
medals without knowing the final scores of any of the
athletes. This method is most appropriate for our measure
of self-evaluated quality. Table 1 describes the full
results of an ordinal logistic regression on self-evaluated
quality. The baseline model presents our results
with only control variables, and Model 1 presents
the results of our models with the variables of interest.
(We also tested each independent variable of interest
independently, and results are consistent with the composite
findings in Model 1.)

Table 2. Three Stage Least Squares Model of Article Views

Model Variable    Model 1    Model 2

124,711 observations; standard errors in parentheses; significance *p < 0.05, **p < 0.01, ***p < 0.001.
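
To make the ordinal logistic model concrete, the sketch below computes category probabilities for the 7-point quality scale from a linear predictor and a set of cutpoints, using the standard cumulative-logit form P(Y <= j) = 1 / (1 + exp(-(cut_j - xb))). The cutpoint values and the two linear predictors are hypothetical and are not estimates from this study.

```python
from math import exp


def ordered_logit_probabilities(linear_predictor, cutpoints):
    """Category probabilities under a cumulative-logit (ordered logit) model.

    With k cutpoints there are k + 1 ordered categories.
    """
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cutpoints must be in ascending order")
    cumulative = [
        1.0 / (1.0 + exp(-(cut - linear_predictor))) for cut in cutpoints
    ]
    cumulative.append(1.0)  # P(Y <= highest category) is 1

    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


SCALE = ["Stub", "Start", "C", "B", "Good", "A", "Featured"]
CUTPOINTS = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical values

low = ordered_logit_probabilities(-1.0, CUTPOINTS)
high = ordered_logit_probabilities(2.5, CUTPOINTS)
```

A larger linear predictor shifts probability mass toward the higher quality classes without assuming that the distance between, say, Stub and Start equals the distance between A and Featured.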

Examining Hypothesis 1, both the linear and squared
coefficients are significant (β = 0.269, p < 0.001 and
β = -0.186, p < 0.001, respectively). These coefficients
indicate an inverted-U relationship with article
quality. Additional contributors working on an information
artifact increase quality up to an optimal point,
but then additional contributors detract from the quality
of information found in the artifact. We also find
support for the second hypothesis that diversity of
resources increases artifact quality. The coefficient on
degree centrality per contributor is positive and significant
(β = 0.168, p < 0.001). The more diverse the content,
process, and reputational knowledge contributors access
and/or represent, the higher quality the information
artifact is likely to be. Hypothesis 3, that depth of resources
available to an information artifact will be positively
related to information quality, is also supported. The
coefficient on eigenvector centrality is positive and significant
(β = 0.105, p < 0.001). The greater the depth of resources
that an artifact can access through or is represented by
its contributors, the higher quality the artifact is likely
to be.
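
For a quadratic specification b1*x + b2*x^2 with b2 < 0, the optimal point of the inverted-U lies at x* = -b1 / (2*b2). The sketch below applies this to the linear and squared coefficients reported above for self-evaluated quality. The function name is illustrative, and because the scaling of the contributor variable is not stated in this section, the result is expressed in the units of the model's predictor rather than as a raw count of contributors.

```python
def turning_point(linear, squared):
    """Location of the maximum of linear*x + squared*x**2."""
    if squared >= 0:
        raise ValueError("an inverted-U requires a negative squared coefficient")
    return -linear / (2.0 * squared)


# Coefficients reported for the ordinal model of self-evaluated quality.
peak = turning_point(0.269, -0.186)  # about 0.72 in predictor units
```

Below this value additional contributors are associated with higher quality; above it the squared term dominates and the association turns negative.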

Table 2 describes the results of a simultaneous equation,
three stage least squares regression on the natural log of
article views (scaled by 1,000 for presentation) using the
sample of 124,711 monthly observations of articles from
December 2007 until June 2009. Model 1 introduces the
focal network variables. Because of the large sample size,
we use a low threshold of statistical significance (p <
0.001) to test our hypotheses.

We find partial support for Hypothesis 2. The number
of unique contributors to user-generated content has a
curvilinear relationship with article views. Both the linear
and squared coefficients are significant (β = 12.28,
p < 0.001 and β = -6.78, p < 0.001, respectively).
These coefficients indicate an inverted-U relationship
with article views. Additional contributors working on
an article increase its quality up to an optimal point,
but then detract from the ability of the article to attract
viewers. We also considered models that utilized either
a linear effect of contributors or a log of the number of
contributors; based on the Akaike information criterion
(AIC), the quadratic model provides a slightly better fit
(decreases in AIC of 18 and 7, respectively).

However, we do not find that the quality of user-generated
content is positively related to local network
centrality; the coefficient for degree centrality per
contributor is negative and significant (β = -1.99, p <
0.010). We do find that the quality of user-generated
content is positively related to global network centrality;
the coefficient for eigenvector centrality is positive and
significant (β = 3.27, p < 0.001). As hypothesized, both
models demonstrate a recursive effect of article viewership
on the number of contributors. In Equation 2,
the coefficient for article views is significant and positive
(β = 0.10, p < 0.001). More contributors lead to greater
viewing, but more viewing also yields a greater number
of contributors. The protect variable, used for identification
of the simultaneous model since it affects contributions
but not viewing, is also significant (β = -5.79,
p < 0.001) and behaves as expected. When an article has
restrictions placed on who can contribute, significantly
fewer people contribute to it.

It is interesting to note that age has the opposite effect
in the contributor model: age is positively related
to viewing but negatively related to the overall number 
of contributors. This suggests that collaborative user- 
generated content matures and stabilizes over time; more 
people come to view older content but they are less likely 
to contribute to that content. It may be that more ma- 
ture content attracts a more general audience that is less 
likely to have the knowledge or inclination to contribute, 
or it may be that the viewers of the content find it to be 
relatively complete and feel they have nothing to add to 
improve it. 

CONCLUSION 

In this paper, we test the influence of the network struc- 
ture created by contributors and information artifacts on 
information quality in peer-produced information. We 
find good general support for our hypotheses, determin- 
ing that the collaboration occurring on one information 
artifact can influence the quality of the other informa- 
tion artifacts on which those collaborators work. Fur- 
thermore, we also demonstrate a recursive relationship 
between contributors and information quality in terms 
of viewership. More viewers bring more collaborators,
which in turn bring more viewers, but this recursive
relationship attenuates over time. Implications are that
researchers should broaden their understanding of how 
collaboration on other information artifacts can influ- 
ence information quality and begin understanding peer- 
production settings as a network of knowledge. 